\section{FAQ}

\textbf{Where do I find the Spark logs?}

\begin{itemize}
    \item \textbf{Standalone mode}: Spark executor logs are located in the directory {\lstinline[style=Bash]|$SPARK_HOME/work/app-<AppName>|} (where \texttt{<AppName>} is the name of your application). This directory also contains the stdout/stderr output from H2O.
    \item \textbf{YARN mode}: The executor logs are available via the {\lstinline[style=Bash]|yarn logs -applicationId <appId>|} command. Driver logs are printed to the console by default; H2O also writes logs to \texttt{current\_dir/h2ologs}.
\end{itemize}

The location of the H2O driver logs can be controlled via the Spark property \\ \texttt{spark.ext.h2o.client.log.dir} (passed via the \texttt{--conf} option).
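
As an illustration, the property can be passed when launching Sparkling Shell. The directory \texttt{/tmp/h2o-driver-logs} is a hypothetical path chosen for this sketch; substitute any directory that the driver process can write to.

\begin{lstlisting}[style=Bash]
# /tmp/h2o-driver-logs is a placeholder directory (an assumption of this example)
bin/sparkling-shell --conf "spark.ext.h2o.client.log.dir=/tmp/h2o-driver-logs"
\end{lstlisting}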

\textbf{How can I display Sparkling Water information in the Spark History Server?}

Sparkling Water already reports this information; you only need to add the Sparkling Water classes to the classpath of the Spark History Server. To configure a Spark application to log to the History Server, see Spark Monitoring Configuration at
\url{http://spark.apache.org/docs/latest/monitoring.html}.
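
A minimal sketch of the application-side settings, assuming the standard Spark event log properties \texttt{spark.eventLog.enabled} and \texttt{spark.eventLog.dir} described on that page. The \texttt{hdfs:///spark-logs} directory is a placeholder; it should match the directory the History Server reads from (\texttt{spark.history.fs.logDirectory}).

\begin{lstlisting}[style=Bash]
# hdfs:///spark-logs is a placeholder; use the directory your History Server reads
bin/sparkling-shell \
  --conf "spark.eventLog.enabled=true" \
  --conf "spark.eventLog.dir=hdfs:///spark-logs"
\end{lstlisting}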

\textbf{Spark is very slow during initialization, or H2O does not form a cluster. What should I do?}

Configure the Spark variable \texttt{SPARK\_LOCAL\_IP}. For example:
       
\begin{lstlisting}[style=Bash]
export SPARK_LOCAL_IP='127.0.0.1'
\end{lstlisting}

\textbf{How do I increase the amount of memory assigned to the Spark executors in Sparkling Shell?}

Sparkling Shell accepts common Spark Shell arguments. For example, to increase the amount of memory allocated to each executor,
use the \\ \texttt{spark.executor.memory} parameter: {\lstinline[style=Bash]|bin/sparkling-shell --conf "spark.executor.memory=4g"|}

\textbf{How do I change the base port H2O uses to find available ports?}

H2O accepts the \texttt{spark.ext.h2o.port.base} parameter via Spark configuration
properties: {\lstinline[style=Bash]|bin/sparkling-shell --conf "spark.ext.h2o.port.base=13431"|}. For a complete list
of configuration options, refer to section~\ref{sec:properties}.

\textbf{How do I use Sparkling Shell to launch a Scala test.script that I created?}

Sparkling Shell accepts common Spark Shell arguments. To pass your script,
use the \texttt{-i} option of Spark Shell: {\lstinline[style=Bash]|bin/sparkling-shell -i test.script|}

\textbf{How do I add Apache Spark classes to the Python path?}

Configure the Python path variable \texttt{PYTHONPATH}:
        
\begin{lstlisting}[style=Bash]
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
# Bash does not expand globs in assignments, so resolve the py4j archive explicitly
export PYTHONPATH=$(ls $SPARK_HOME/python/lib/py4j-*-src.zip):$PYTHONPATH
\end{lstlisting}

\textbf{I am trying to import a class from the \texttt{hex} package in Sparkling Shell, but I get the following error:}

\begin{lstlisting}[style=Scala]
error: missing arguments for method hex in object functions; follow this method with '_' if you want to treat it as a partially applied
\end{lstlisting}

Use the following syntax to import a class from the \texttt{hex} package:
\begin{lstlisting}[style=Scala]
import _root_.hex.tree.gbm.GBM
\end{lstlisting}


\textbf{I am trying to run Sparkling Water on an HDP YARN cluster, but I get the following error:}

\begin{lstlisting}[style=Scala]
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
\end{lstlisting}

The YARN timeline service is not compatible with the libraries provided by Spark. Disable the timeline service by setting \\
\texttt{spark.hadoop.yarn.timeline-service.enabled=false}.
For more details, see \url{https://issues.apache.org/jira/browse/SPARK-15343}.
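
For example, the property can be set on the command line when starting Sparkling Shell. This is a sketch that passes the setting with the same \texttt{--conf} option used elsewhere in this section; other launch options for your cluster are omitted.

\begin{lstlisting}[style=Bash]
# Disable the YARN timeline service for this Spark application only
bin/sparkling-shell --conf "spark.hadoop.yarn.timeline-service.enabled=false"
\end{lstlisting}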

\textbf{How can I configure the Hive metastore location?}

The Spark SQL context (in fact, Hive) requires a metastore, which stores metadata about Hive tables. For this
to work correctly, the {\lstinline[style=Bash]|${SPARK_HOME}/conf/hive-site.xml|} file needs to contain the following
configuration. We provide two examples: using MySQL and using Derby as the metastore.

For MySQL, the following configuration needs to be located in the {\lstinline[style=Bash]|${SPARK_HOME}/conf/hive-site.xml|} configuration file:

\begin{lstlisting}[style=Bash]
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://{mysql_host}:{mysql_port}/{metastore_db}?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>{username}</value>
    <description>username to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>{password}</value>
    <description>password to use against metastore database</description>
</property>
\end{lstlisting}

where:
\begin{itemize}
\item \{mysql\_host\} and \{mysql\_port\} are the host and port of the MySQL database.
\item \{metastore\_db\} is the name of the MySQL database holding all the metastore tables.
\item \{username\} and \{password\} are the username and password for the MySQL database, with read and write access to the \{metastore\_db\} database.
\end{itemize}

For Derby, the following configuration needs to be located in the {\lstinline[style=Bash]|${SPARK_HOME}/conf/hive-site.xml|} configuration file:

\begin{lstlisting}[style=Bash]
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://{file_location}/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
\end{lstlisting}

where:
\begin{itemize}
\item \{file\_location\} is the location of the metastore\_db database file.
\end{itemize}

\textbf{After converting a Spark Data Frame to an H2O Frame, why do I see only 100 columns in the console?}

A Spark Data Frame with more than 100 columns is not treated any differently: the Spark Data Frame is always fully
converted to an H2O Frame. Only the number of columns sent to the client is limited, because that many columns are hard
to read in the console and the limit reduces the amount of data transferred between the client and the backend.
To configure how many columns are sent to the client, specify the number as part of the conversion method:

\begin{lstlisting}[style=Python]
h2o_context.asH2OFrame(dataframe, "Frame_Name", 200)
\end{lstlisting}

The last parameter specifies the number of columns to send for the preview.

\textbf{I am getting an exception about an invalid locale. What should I do?}

If a Spark stage in your ML pipeline fails with a \texttt{java.lang.reflect.InvocationTargetException} caused by a \texttt{java.lang.IllegalArgumentException} saying that \\
the \texttt{locale} parameter of \textit{YOUR\_SPARK\_ML\_STAGE} was given the invalid value \textit{YOUR\_LOCALE},
set the default locale of the Spark driver JVM to a valid combination of language and country:

\begin{lstlisting}[style=Bash]
--conf spark.driver.extraJavaOptions="-Duser.language=en -Duser.country=US"
\end{lstlisting}
